Abstract

We study multi-parameter Tikhonov regularization, i.e., with multiple penalties. Such models 
are useful when the sought-for solution exhibits several distinct features simultaneously. Two choice 
rules, i.e., discrepancy principle and balancing principle, are studied for choosing an appropriate 
(vector-valued) regularization parameter, and some theoretical results are presented. In particular, 
the consistency of the discrepancy principle as well as convergence rate are established, and an a 
posteriori error estimate for the balancing principle is established. Also two fixed point algorithms are 
proposed for computing the regularization parameter by the latter rule. Numerical results for several 
nonsmooth multi-parameter models are presented, which show clearly their superior performance over 
their single-parameter counterparts. 
1 Introduction

In this paper, we are interested in solving linear inverse problems 

$Kx = y^\delta$, (1)

where $y^\delta \in Y$ is a noisy version of the exact data $y^\dagger = Kx^\dagger \in Y$ with $\delta^2 = \phi(x^\dagger, y^\delta)$ being the noise level,
the operator $K : X \to Y$ is bounded and linear, and the spaces $X$ and $Y$ are Banach spaces.

Typically, problem (1) suffers from ill-posedness in the sense that a small perturbation in the data
might lead to large deviations in the retrieved solution, and this often poses great challenges to their
stable yet accurate numerical solution. Usually, a regularization strategy is applied to find a stable
approximate solution [18, 6]. The most widely adopted approach is Tikhonov regularization, which seeks
an approximation $x_\eta^\delta$ to problem (1) by minimizing the following Tikhonov functional

$J_\eta(x) = \phi(x, y^\delta) + \eta \cdot \psi(x)$.

Here the functionals $\phi$ and $\psi$ represent data fidelity and (vector-valued) penalty, respectively, and
$\eta \cdot \psi(x)$ is the dot product between $\eta = (\eta_1, \ldots, \eta_n)^T$ and $\psi(x) = (\psi_1(x), \ldots, \psi_n(x))^T$, i.e.,
$\eta \cdot \psi(x) = \sum_{i=1}^n \eta_i \psi_i(x)$. Common choices of the fidelity $\phi(x, y^\delta)$ include $\|Kx - y^\delta\|_{L^2}^2$, $\|Kx - y^\delta\|_{L^1}$ and
$\int (Kx - y^\delta \ln Kx)$, which are statistically well suited to additive Gaussian noise, Laplace (impulsive) noise and
Poisson noise, respectively. The penalties $\psi_i$ are nonnegative, convex and (weak) lower semicontinuous.
The typical choice includes $\|x\|_{L^2}^2$, $\|x\|_{\ell^p}^p$, $\|x\|_{H^m}^2$ and $|x|_{TV}$, etc. The regularization parameter vector $\eta$
compromises fidelity with penalties.

The use of multiple penalties, henceforth called multi-parameter regularization, in the functional $J_\eta$ is
motivated by practical applications which exhibit multiple/multiscale features. Take microarray
data analysis as an example. Here the number of data is often far less than that of the unknowns. A
desirable approach should select all variables relevant to the proper functioning of the gene network. The
conventional $\ell^2$ penalty tends to select all variables, including irrelevant ones, since the resulting estimate
has almost no zero entries. To remedy this issue, the $\ell^1$ penalty has been suggested as an alternative.
However, the $\ell^1$ approach delivers undesirable results for problems where there are highly correlated
features and all relevant ones are to be identified, in that it tends to select only one feature out of the
relevant group instead of all relevant features of the group [20], thereby missing the groupwise structure.
Zou and Hastie [20] proposed the elastic-net by incorporating the $\ell^2$ penalty into the $\ell^1$ penalty, in the
hope of retrieving the whole relevant group, and numerically demonstrated its excellent performance for
simulation studies and real-data applications. Such multiple/multiscale features appear also in many
other applications, e.g., image processing [HI [17], electrocardiography [3], and geodesy [19]. 

A number of experimental studies [3, 20, 19] have shown great potential of multi-parameter models
for better capturing multiple distinct features of the solution. However, a general theory for such models 
remains largely under-explored. There are several attempts on various aspects, e.g., parameter choice, 
convergence and statistical interpretation fTJ [2 H] \§\ El H2 US] of multi-parameter regularization. For 
instance, Lu et al. [12] discussed the discrepancy principle using Hilbert space scales, and derived some
error estimates, but the parameter is vastly nonunique and it remains unclear which one to use. They
also adapted the model function approach to choose the regularization parameter, but the underlying
mechanism remains unclear. Jin et al. [5] recently investigated the properties, e.g., consistency and error
estimates, of elastic-net for asymptotically linear coupling between the two terms, and proposed two
active-set type methods for efficient numerical realization.

This paper aims at developing some theory for such models in a general framework. The value 
function and its properties are first derived. Then two parameter choice rules, i.e., discrepancy principle 
and balancing principle, are studied. The consistency and convergence rates are established for the 
former. The balancing principle can be derived from Bayesian inference [10], and it was generalized
in [jj]. The principle balances the penalty with the fidelity term. The variant under consideration here is
solely based on the value function, and does not require a knowledge of the noise level. An a posteriori 
error estimate is derived, and two efficient numerical algorithms are also proposed. 

The rest of the paper is structured as follows. In Section 2 we investigate the value function and derive
some properties, e.g., monotonicity, concavity, asymptotics and especially differentiability. In Section 3
we investigate two parameter choice rules, i.e., discrepancy principle and balancing principle, and discuss
their theoretical properties. In addition, two fixed point algorithms for the efficient numerical realization
of the balancing principle are proposed. Numerical results for several examples are presented in Section
4 to illustrate the efficiency and accuracy of the proposed approaches. Finally, we conclude the paper
with several future research topics. 

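Notation. Let $x_\eta^\delta$ be a minimizer of the functional $J_\eta(x)$, and $\mathcal{M}_\eta$ be the set of minimizers. For vectors
$\eta \in \mathbb{R}^n$ and $\hat\eta \in \mathbb{R}^n$, we write $\eta \le \hat\eta$ if $\eta_i \le \hat\eta_i$ for all $1 \le i \le n$.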



2 The value function and its properties 

In this section, we collect important properties of the value function $F(\eta)$ defined by

$F(\eta) = \inf_{x \in Q_{ad}} J_\eta(x)$, (2)

where the set $Q_{ad}$ stands for a convex constraint. Here, the existence of a minimizer $x_\eta^\delta$ to the functional
$J_\eta$ is not a priori assumed. Provided that a minimizer $x_\eta^\delta$ does exist, we have $F(\eta) = J_\eta(x_\eta^\delta)$. The value
function $F$ will play an important role in developing a balancing principle, see Section 3.2. The results
presented below generalize those for the single parameter [7], and the proofs are similar and thus omitted.
A first result shows the continuity and concavity of $F$.

Lemma 2.1. The value function $F(\eta)$ is monotonically increasing in the sense $F(\hat\eta) \le F(\eta)$ if $\hat\eta \le \eta$,
and it is concave.



Remark 2.1. Lemma 2.1 does not require the existence of $x \in Q_{ad}$ that achieves the infimum of $J_\eta$.
The results are also true for nonlinear operators and in the presence of the convex constraint $Q_{ad}$.
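
The monotonicity and concavity of the value function can be checked numerically on a small example. The following is a minimal sketch, assuming a hypothetical diagonal model that is not taken from the paper: $K = \mathrm{diag}(s_j)$, $\phi(x,y) = \sum_j (s_j x_j - y_j)^2$, $\psi_1(x) = \sum_j x_j^2$ and $\psi_2(x) = \sum_j w_j x_j^2$, for which the minimizer is available in closed form. The names `S`, `W`, `Y`, `minimizer`, `value_function`, `is_monotone` and `is_midpoint_concave` are illustrative choices only.

```python
# Toy diagonal model (assumed for illustration only): K = diag(S),
# phi(x, y) = sum_j (S_j x_j - Y_j)^2, psi_1(x) = sum_j x_j^2,
# psi_2(x) = sum_j W_j x_j^2.
S = [1.0, 0.5, 0.1, 0.01]    # hypothetical singular values of K
W = [0.0, 1.0, 4.0, 9.0]     # hypothetical weights of the second penalty
Y = [1.0, 0.4, 0.12, -0.05]  # hypothetical noisy data


def minimizer(eta1, eta2):
    """Closed-form minimizer of J_eta for the diagonal toy model."""
    return [s * y / (s * s + eta1 + eta2 * w) for s, w, y in zip(S, W, Y)]


def value_function(eta1, eta2):
    """F(eta) = phi + eta1 * psi_1 + eta2 * psi_2 evaluated at the minimizer."""
    x = minimizer(eta1, eta2)
    phi = sum((s * xj - y) ** 2 for s, xj, y in zip(S, x, Y))
    psi1 = sum(xj * xj for xj in x)
    psi2 = sum(w * xj * xj for w, xj in zip(W, x))
    return phi + eta1 * psi1 + eta2 * psi2


def is_monotone(eta, eta_hat):
    """Check F(eta_hat) <= F(eta) for a pair with eta_hat <= eta componentwise."""
    return value_function(*eta_hat) <= value_function(*eta)


def is_midpoint_concave(eta_a, eta_b):
    """Check F((a + b) / 2) >= (F(a) + F(b)) / 2, a consequence of concavity."""
    mid = [(a + b) / 2 for a, b in zip(eta_a, eta_b)]
    average = 0.5 * (value_function(*eta_a) + value_function(*eta_b))
    return value_function(*mid) >= average
```

For instance, `is_monotone((0.2, 0.3), (0.1, 0.1))` and `is_midpoint_concave((0.01, 0.2), (0.5, 0.05))` both hold for this toy model, in line with Lemma 2.1.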

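Next we examine the properties of the value function $F$ more closely. Recall first that the one-sided partial
derivatives $\partial_i^\pm F$ are defined by

$\partial_i^- F(\eta) = \lim_{h \to 0^+} \frac{F(\eta) - F(\eta - h e_i)}{h}$,
$\partial_i^+ F(\eta) = \lim_{h \to 0^+} \frac{F(\eta + h e_i) - F(\eta)}{h}$,

where $e_i$ is the $i$th canonical basis vector.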

The next result shows some properties, i.e., existence, nonnegativity, monotonicity and (left- and
right-) continuity, of the one-sided partial derivatives $\partial_i^\pm F$. The properties follow directly from Lemma 2.1.

Lemma 2.2. For any $\eta > 0$, there hold
(i) The one-sided partial derivatives $\partial_i^\pm F(\eta)$ exist, and $\partial_i^\pm F(\eta) \ge 0$;
(ii) For any $h > 0$, there holds $0 \le \partial_i^+ F(\eta + h e_i) \le \partial_i^- F(\eta + h e_i) \le \partial_i^+ F(\eta) \le \partial_i^- F(\eta)$;
(iii) $\partial_i^- F(\eta) = \lim_{h \to 0^+} \partial_i^- F(\eta - h e_i)$ and $\partial_i^+ F(\eta) = \lim_{h \to 0^+} \partial_i^+ F(\eta + h e_i)$.

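Remark 2.2. The partial differentiability of $F$ in the $i$-th direction at $\eta$ guarantees the continuity of
$\partial_i^\pm F$ at this point. Indeed, the monotonicity of $\partial_i^\pm F$ and the right continuity of $\partial_i^+ F$ yield the inequalities

$\partial_i^+ F(\eta) = \lim_{h \to 0^+} \partial_i^+ F(\eta + h e_i) \le \lim_{h \to 0^+} \partial_i^- F(\eta + h e_i) \le \partial_i^- F(\eta)$.

Now suppose $F$ is differentiable at $\eta$, i.e., $\partial_i^- F(\eta) = \partial_i^+ F(\eta)$. Then from the inequalities it follows that

$\lim_{h \to 0^+} \partial_i^- F(\eta + h e_i) = \partial_i^- F(\eta)$,

which, together with the left continuity in Lemma 2.2(iii), shows the continuity of $\partial_i^- F$ at $\eta$. Similarly it follows that $\partial_i^+ F$ is continuous at $\eta$.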

The asymptotic behavior of $F(\eta)$ is useful for designing numerical algorithms.
Proposition 2.1. The following asymptotics of $F$ hold

$\lim_{|\eta| \to 0} F(\eta) = \inf_{x \in Q_{ad}} \phi(x, y^\delta)$ and $\lim_{|\eta| \to 0} \eta_i \partial_i^\pm F(\eta) = 0$.

The partial derivatives $\partial_i^\pm F$ are closely connected to the fidelity $\phi$ and penalty $\psi$ under the assumption
of existence of a minimizer, i.e., the set $\mathcal{M}_\eta$ is nonempty. This is guaranteed by:

Assumption 2.1. The functionals $\phi$ and $\psi_i$ satisfy:

(i) For any sequence $\{x_n\}_n \subset Q_{ad}$ such that $\phi$ and $\psi_i$ for all $1 \le i \le n$ are uniformly bounded, there
exists a subsequence $\{x_{n_k}\}_k$ which converges to an element $x^* \in Q_{ad}$ in the $\tau$-topology.

(ii) $\phi$ and $\psi_i$ are lower semicontinuous with respect to $\tau$-convergent sequences, i.e., if a sequence
$\{x_n\}_n$ converges to $x^* \in Q_{ad}$ in $\tau$-topology, then

$\phi(x^*) \le \liminf_{n \to \infty} \phi(x_n)$ and $\psi_i(x^*) \le \liminf_{n \to \infty} \psi_i(x_n)$.

In case that the set $\mathcal{M}_\eta$ contains multiple elements, there might exist distinct $x_\eta^\delta, \tilde x_\eta^\delta \in \mathcal{M}_\eta$ such that

$F(\eta) = \phi(x_\eta^\delta, y^\delta) + \eta \cdot \psi(x_\eta^\delta) = \phi(\tilde x_\eta^\delta, y^\delta) + \eta \cdot \psi(\tilde x_\eta^\delta)$ but $\phi(x_\eta^\delta, y^\delta) \ne \phi(\tilde x_\eta^\delta, y^\delta)$,

i.e., the functions $\phi(x_\eta^\delta, y^\delta)$ and $\psi(x_\eta^\delta)$ are potentially multi-valued in $\eta$.
A first relation between $\psi_i$ and $\partial_i^\pm F$ is given by



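Lemma 2.3. Let Assumption 2.1 be fulfilled. Then for any $x_\eta^\delta \in \mathcal{M}_\eta$, there hold

$\partial_i^+ F(\eta) \le \psi_i(x_\eta^\delta) \le \partial_i^- F(\eta)$, $i = 1, \ldots, n$,

$F(\eta) - \sum_{i=1}^n \eta_i \partial_i^- F(\eta) \le \phi(x_\eta^\delta, y^\delta) \le F(\eta) - \sum_{i=1}^n \eta_i \partial_i^+ F(\eta)$.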

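An immediate consequence of Lemma 2.3 is:

Corollary 2.1. Let Assumption 2.1 be fulfilled. If $\partial_i F(\eta)$ exists at $\eta$ for all $i$, then $\psi_i(x_\eta^\delta)$ and $\phi(x_\eta^\delta, y^\delta)$
are single valued at $\eta$ and

$\partial_i F(\eta) = \psi_i(x_\eta^\delta)$ and $F(\eta) - \eta \cdot \nabla F(\eta) = \phi(x_\eta^\delta, y^\delta)$ $\forall x_\eta^\delta \in \mathcal{M}_\eta$.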
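
The identities relating the partial derivatives of the value function to the penalties and the fidelity can be verified by finite differences. The sketch below again assumes a hypothetical diagonal model with quadratic penalties (unique minimizer, hence differentiable value function); the data `S`, `W`, `Y` and the helpers `terms`, `value_function` and `central_difference` are illustrative assumptions, not part of the paper.

```python
# Toy diagonal model (assumed for illustration only); see the comments below.
S = [1.0, 0.5, 0.1, 0.01]    # hypothetical singular values of K
W = [0.0, 1.0, 4.0, 9.0]     # hypothetical weights of the second penalty
Y = [1.0, 0.4, 0.12, -0.05]  # hypothetical noisy data


def terms(eta1, eta2):
    """Return (phi, psi_1, psi_2) at the unique minimizer of J_eta."""
    phi = psi1 = psi2 = 0.0
    for s, w, y in zip(S, W, Y):
        xj = s * y / (s * s + eta1 + eta2 * w)  # closed-form minimizer
        phi += (s * xj - y) ** 2
        psi1 += xj * xj
        psi2 += w * xj * xj
    return phi, psi1, psi2


def value_function(eta1, eta2):
    """F(eta) = phi + eta . psi at the minimizer."""
    phi, psi1, psi2 = terms(eta1, eta2)
    return phi + eta1 * psi1 + eta2 * psi2


def central_difference(eta1, eta2, i, h=1e-6):
    """Approximate the i-th partial derivative of F (i = 0 or 1)."""
    if i == 0:
        return (value_function(eta1 + h, eta2) - value_function(eta1 - h, eta2)) / (2 * h)
    return (value_function(eta1, eta2 + h) - value_function(eta1, eta2 - h)) / (2 * h)
```

At, e.g., $\eta = (0.05, 0.02)$ the finite-difference derivatives should agree with $\psi_1$ and $\psi_2$ up to discretization error, and $F - \eta \cdot \nabla F$ should reproduce the fidelity value.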

More precisely, the partial derivatives $\partial_i^\pm F$ can be expressed by $\psi_i$ as follows.

Theorem 2.1. Let Assumption 2.1 hold. Then for any $\eta > 0$ and every $i$, there exist $x_i^+, x_i^- \in \mathcal{M}_\eta$ such
that

$\psi_i(x_i^+) = \partial_i^+ F(\eta)$ and $\psi_i(x_i^-) = \partial_i^- F(\eta)$.



Theorem 2.1 in conjunction with Lemma 2.3 implies the following corollary.



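Corollary 2.2. Let Assumption 2.1 hold. Then
(i) There exist $x_i^+, x_i^- \in \mathcal{M}_\eta$ such that $\psi_i(x_i^+) = \min_{x \in \mathcal{M}_\eta} \psi_i(x)$ and $\psi_i(x_i^-) = \max_{x \in \mathcal{M}_\eta} \psi_i(x)$.

(ii) If $\psi_i(x_\eta^\delta) = \psi_i(\tilde x_\eta^\delta)$ for all $x_\eta^\delta, \tilde x_\eta^\delta \in \mathcal{M}_\eta$ for all $\eta > 0$, then $\partial_i F(\eta)$ exists and it is continuous.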

The last result gives a sufficient condition for the differentiability of the value function $F$. It plays an
important role in designing efficient algorithms for certain choice rules, e.g., Morozov's
principle and the balancing principle [11].

Theorem 2.2. Assume that the minimizer of the functional $J_\eta$ is unique at $\eta > 0$. Then the derivatives
$\{\partial_i F(\eta)\}_i$ exist and are continuous at $\eta$. In particular, $F$ is differentiable at $\eta$.



3 Parameter choice rules 

In this section, we discuss two choice rules, i.e., discrepancy principle [16, 10] and balancing principle [7],
for multi-parameter models. For notational simplicity, we shall restrict our attention to the case of two
penalty terms.

3.1 Discrepancy principle 

Here we investigate the discrepancy principle due to Morozov [16] for multi-parameter regularization. We
shall assume a triangle-type inequality for the functional $\phi$.

Assumption 3.1. The functional $\phi(x, y)$ vanishes if and only if $Kx = y$, and satisfies an inequality
$\phi(x, y) \le c(\phi(x', y') + \phi(x, y'))$ for some constant $c$ and any $x'$ with $Kx' = y$.

The discrepancy principle determines an appropriate (vector-valued) regularization parameter $\eta$ by

$\phi(x_\eta^\delta, y^\delta) = c_m \delta^2$ (3)

for some constant $c_m \ge 1$. The rationale of the principle is that the solution accuracy in terms of the
residual should be compatible with the data accuracy (noise level).
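
Since the discrepancy equation is a single scalar equation in a vector-valued parameter, one simple way to obtain a solution is to fix a search direction and solve along that ray. The sketch below is one such strategy under stated assumptions, not the authors' procedure: it uses a hypothetical diagonal model with quadratic penalties, for which the residual is nondecreasing along any ray $\eta = t\,d$ with $d > 0$, and applies bisection in log scale. The names `S`, `W`, `Y`, `residual` and `discrepancy_on_ray`, and the default bracket, are illustrative.

```python
import math

# Toy diagonal model (assumed for illustration only).
S = [1.0, 0.5, 0.1, 0.01]    # hypothetical singular values of K
W = [0.0, 1.0, 4.0, 9.0]     # hypothetical weights of the second penalty
Y = [1.0, 0.4, 0.12, -0.05]  # hypothetical noisy data


def residual(eta1, eta2):
    """phi(x_eta, y) = sum_j (S_j x_j - Y_j)^2 at the closed-form minimizer."""
    total = 0.0
    for s, w, y in zip(S, W, Y):
        a = eta1 + eta2 * w
        total += (y * a / (s * s + a)) ** 2  # S_j x_j - Y_j = -Y_j a / (S_j^2 + a)
    return total


def discrepancy_on_ray(direction, delta, c_m=1.1, lo=1e-12, hi=1e6, iters=200):
    """Find eta = t * direction with residual(eta) = c_m * delta^2 by bisection."""
    target = c_m * delta ** 2
    d1, d2 = direction
    if not residual(lo * d1, lo * d2) < target < residual(hi * d1, hi * d2):
        raise ValueError("target residual is not bracketed on this ray")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint, since t spans many decades
        if residual(mid * d1, mid * d2) < target:
            lo = mid
        else:
            hi = mid
    t = math.sqrt(lo * hi)
    return (t * d1, t * d2)
```

Different directions, e.g. `(1.0, 1.0)` and `(1.0, 0.1)`, give different parameter vectors that both satisfy the discrepancy equation, which illustrates why the parameter chosen by this principle is in general not unique.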

Theorem 3.1. Let Assumptions 2.1 and 3.1 be satisfied and the operator $K$ be injective. Then for any
$\eta = \eta(\delta)$ satisfying (3) and $c_0 \le \frac{\eta_1(\delta)}{\eta_2(\delta)} \le c_1$ for some $c_0, c_1 > 0$, there holds $\lim_{\delta \to 0} x_\eta^\delta = x^\dagger$ in $\tau$-topology.

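Proof. The minimizing property of $x_\eta^\delta$ implies

$\phi(x_\eta^\delta, y^\delta) + \eta \cdot \psi(x_\eta^\delta) \le \phi(x^\dagger, y^\delta) + \eta \cdot \psi(x^\dagger)$.

From the discrepancy equation (3), we deduce

$\eta \cdot \psi(x_\eta^\delta) \le \eta \cdot \psi(x^\dagger)$. (4)

Therefore, either $\psi_1(x_\eta^\delta) \le \psi_1(x^\dagger)$ or $\psi_2(x_\eta^\delta) \le \psi_2(x^\dagger)$ holds. Now the assumption $c_0 \le \frac{\eta_1(\delta)}{\eta_2(\delta)} \le c_1$
implies that the sequence $\{\psi_i(x_\eta^\delta), i = 1, 2\}_\delta$ is uniformly bounded. Hence the coercivity of the functional
indicates that the sequence $\{x_\eta^\delta\}_\delta$ is uniformly bounded. Thus there exists a subsequence, also denoted
by $\{x_\eta^\delta\}_\delta$, and some $x^*$, such that

$x_\eta^\delta \to x^*$ in $\tau$-topology.

The $\tau$-lower semicontinuity of the functional $\phi$ and Assumption 3.1 yield

$0 \le \phi(x^*, y^\dagger) \le c \liminf_{\delta \to 0} (\phi(x^\dagger, y^\delta) + \phi(x_\eta^\delta, y^\delta)) \le \liminf_{\delta \to 0} c(1 + c_m)\delta^2 = 0$.

In particular, $\phi(x^*, y^\dagger) = 0$, i.e., $Kx^* = y^\dagger$. This together with the injectivity of $K$ implies $x^* = x^\dagger$. Since
every subsequence has a subsubsequence converging to $x^\dagger$, the whole sequence converges to $x^\dagger$. □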

Remark 3.1. The condition $c_0 \le \frac{\eta_1(\delta)}{\eta_2(\delta)} \le c_1$ ensures the uniform boundedness of both penalties, and thus
we can utilize the lower semicontinuity of the functionals to arrive at the desired $\tau$-convergence.

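Theorem 3.2. Let Assumptions 2.1 and 3.1 hold. If a subsequence $\{\eta(\delta)\}_\delta$ converges and $\bar\eta = \lim_{\delta \to 0} \frac{\eta_1(\delta)}{\eta_2(\delta)} > 0$,
then the sequence $\{x_\eta^\delta\}_\delta$ contains a subsequence $\tau$-converging to an $\bar\eta\psi_1 + \psi_2$-minimizing solution and

$\lim_{\delta \to 0} (\bar\eta\psi_1(x_\eta^\delta) + \psi_2(x_\eta^\delta)) = \bar\eta\psi_1(x^\dagger) + \psi_2(x^\dagger)$.

Moreover, if the $\bar\eta\psi_1 + \psi_2$-minimizing solution is unique, then the whole subsequence $\tau$-converges.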



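Proof. By repeating the arguments in Theorem 3.1, we deduce that there exists a subsequence, also
denoted by $\{x_\eta^\delta\}_\delta$, and some $x^*$, such that

$x_\eta^\delta \to x^*$ in $\tau$-topology,

and by the $\tau$-lower semicontinuity, we have $\phi(x^*, y^\dagger) = 0$. By virtue of lower semicontinuity of the
functionals and inequality (4), we deduce

$\bar\eta\psi_1(x^*) + \psi_2(x^*) \le \liminf_{\delta \to 0} \left(\frac{\eta_1(\delta)}{\eta_2(\delta)}\psi_1(x_\eta^\delta) + \psi_2(x_\eta^\delta)\right)$
$\le \limsup_{\delta \to 0} \left(\frac{\eta_1(\delta)}{\eta_2(\delta)}\psi_1(x_\eta^\delta) + \psi_2(x_\eta^\delta)\right)$
$\le \lim_{\delta \to 0} \left(\frac{\eta_1(\delta)}{\eta_2(\delta)}\psi_1(x^\dagger) + \psi_2(x^\dagger)\right) = \bar\eta\psi_1(x^\dagger) + \psi_2(x^\dagger)$,

which together with the identity $\phi(x^*, y^\dagger) = 0$ implies that $x^*$ is an $\bar\eta\psi_1 + \psi_2$-minimizing solution. The
desired identity follows from the above inequalities with $x^*$ in place of $x^\dagger$. The whole sequence convergence
follows from the standard subsequence argument. □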

In Theorems 3.1 and 3.2 we have assumed the existence of a solution $\eta$ to equation (3). This is
guaranteed if the Tikhonov functional has a unique minimizer, see Theorem 2.2.

Theorem 3.3. Assume that $J_\eta$ has a unique minimizer for all $\eta > 0$, $\lim_{|\eta| \to 0} \phi(x_\eta^\delta, y^\delta) < c_m \delta^2$, and
there is a sequence $\{\eta^n\}$ such that $\lim_{n \to \infty} \phi(x_{\eta^n}^\delta, y^\delta) > c_m \delta^2$. Then there exists at least one solution to
equation (3).

Proof. By Theorem 2.2 and Lemma 2.1, the uniqueness of a minimizer to $J_\eta$ for all $\eta > 0$ implies that
the function $\phi(x_\eta^\delta, y^\delta)$ is continuous in $\eta$. The desired assertion follows from the continuity directly. □

Lastly, we present an error estimate in case of $Y$ being a Hilbert space, $\phi(x, y^\delta) = \|Kx - y^\delta\|^2$
and convex penalties $\psi$. We use the Bregman distance to measure the error. Denote the subdifferential of a
functional $\psi(x)$ at $x^\dagger$ by $\partial\psi(x^\dagger)$, i.e., $\partial\psi(x^\dagger) = \{\xi \in X^* : \psi(x) \ge \psi(x^\dagger) + \langle \xi, x - x^\dagger \rangle \ \forall x \in X\}$, and the
Bregman distance, for any $\xi \in \partial\psi(x^\dagger)$, by

$d_\xi(x, x^\dagger) := \psi(x) - \psi(x^\dagger) - \langle \xi, x - x^\dagger \rangle$.
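
To make the Bregman distance concrete, the following minimal sketch evaluates it in finite dimensions for two standard penalties. The function names and the test vectors are illustrative assumptions; the subgradient `xi` must be supplied by the caller and is not checked for validity.

```python
def bregman_distance(psi, xi, x, x_dag):
    """d_xi(x, x_dag) = psi(x) - psi(x_dag) - <xi, x - x_dag>."""
    inner = sum(g * (a - b) for g, a, b in zip(xi, x, x_dag))
    return psi(x) - psi(x_dag) - inner


def half_squared_l2(x):
    """psi(x) = 0.5 * ||x||_2^2, whose only subgradient at x_dag is x_dag itself."""
    return 0.5 * sum(v * v for v in x)


def l1_norm(x):
    """psi(x) = ||x||_1; a subgradient has entries sign(x_dag_j), or any value in [-1, 1] where x_dag_j = 0."""
    return sum(abs(v) for v in x)
```

For the quadratic penalty the Bregman distance reduces to $\frac12\|x - x^\dagger\|^2$, e.g. `bregman_distance(half_squared_l2, [0.0, 1.0], [1.0, 2.0], [0.0, 1.0])` equals 1.0. For the $\ell^1$ penalty it vanishes whenever $x$ has the same sign pattern as the chosen subgradient, so it measures errors in the support and signs rather than in the magnitudes.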

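Theorem 3.4. If $Y$ is a Hilbert space and the exact solution $x^\dagger$ satisfies the source condition: $\mathrm{range}(K^*) \cap
\partial\psi_1(x^\dagger) \cap \partial\psi_2(x^\dagger) \ne \emptyset$. Then for any $\eta^*$ solving (3) there exists some $i$ and $\xi_i \in \partial\psi_i(x^\dagger)$ such that

$d_{\xi_i}(x_{\eta^*}^\delta, x^\dagger) \le C\delta$.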

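Proof. By the minimizing property of $x_{\eta^*}^\delta$, we have

$\phi(x_{\eta^*}^\delta, y^\delta) + \eta^* \cdot \psi(x_{\eta^*}^\delta) \le \phi(x^\dagger, y^\delta) + \eta^* \cdot \psi(x^\dagger) \le \delta^2 + \eta^* \cdot \psi(x^\dagger)$.

The definition of the discrepancy principle indicates

$\eta^* \cdot \psi(x_{\eta^*}^\delta) \le \eta^* \cdot \psi(x^\dagger)$.

Consequently, there holds $\psi_i(x_{\eta^*}^\delta) \le \psi_i(x^\dagger)$ for either $i = 1$ or $i = 2$. Therefore, by the
source condition, for some $\xi_i \in \mathrm{range}(K^*) \cap \partial\psi_i(x^\dagger)$, or equivalently $\xi_i = K^* w_i$ for some source representer
$w_i$, and the Cauchy-Schwarz inequality, we deduce

$d_{\xi_i}(x_{\eta^*}^\delta, x^\dagger) = \psi_i(x_{\eta^*}^\delta) - \psi_i(x^\dagger) - \langle \xi_i, x_{\eta^*}^\delta - x^\dagger \rangle \le -\langle \xi_i, x_{\eta^*}^\delta - x^\dagger \rangle$
$= -\langle K^* w_i, x_{\eta^*}^\delta - x^\dagger \rangle = -\langle w_i, K(x_{\eta^*}^\delta - x^\dagger) \rangle$
$\le \|w_i\| \|K(x_{\eta^*}^\delta - x^\dagger)\|$
$\le \|w_i\| (\|Kx_{\eta^*}^\delta - y^\delta\| + \|y^\delta - Kx^\dagger\|) \le (1 + \sqrt{c_m}) \|w_i\| \delta$.

This shows the desired estimate. □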



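The source condition in Theorem 3.4 can be hard to verify. Alternatively, we can have another
convergence rates result under a seemingly less restrictive assumption.

Theorem 3.5. If $Y$ is a Hilbert space and the exact solution $x^\dagger$ satisfies the source condition: for any
$t \in [0, 1]$, there exists $w_t$ such that $K^* w_t = \xi_t \in \partial(t\psi_1(x^\dagger) + (1 - t)\psi_2(x^\dagger))$. Then for any $\eta^*$ solving (3)
and letting $t^* = \frac{\eta_1^*(\delta)}{\eta_1^*(\delta) + \eta_2^*(\delta)}$, the following estimate holds

$d_{\xi_{t^*}}(x_{\eta^*}^\delta, x^\dagger) \le C\delta$.

Proof. By the minimizing property of $x_{\eta^*}^\delta$, we have

$t^*\psi_1(x_{\eta^*}^\delta) + (1 - t^*)\psi_2(x_{\eta^*}^\delta) \le t^*\psi_1(x^\dagger) + (1 - t^*)\psi_2(x^\dagger)$.

Therefore, by the source condition, for some $\xi_{t^*} \in \partial(t^*\psi_1(x^\dagger) + (1 - t^*)\psi_2(x^\dagger))$ and $w_{t^*} \in Y$ such that
$\xi_{t^*} = K^* w_{t^*}$, and the Cauchy-Schwarz inequality, we deduce

$d_{\xi_{t^*}}(x_{\eta^*}^\delta, x^\dagger) = (t^*\psi_1(x_{\eta^*}^\delta) + (1 - t^*)\psi_2(x_{\eta^*}^\delta)) - (t^*\psi_1(x^\dagger) + (1 - t^*)\psi_2(x^\dagger)) - \langle \xi_{t^*}, x_{\eta^*}^\delta - x^\dagger \rangle$
$\le -\langle \xi_{t^*}, x_{\eta^*}^\delta - x^\dagger \rangle = -\langle K^* w_{t^*}, x_{\eta^*}^\delta - x^\dagger \rangle$
$= -\langle w_{t^*}, K(x_{\eta^*}^\delta - x^\dagger) \rangle \le \|w_{t^*}\| \|K(x_{\eta^*}^\delta - x^\dagger)\|$
$\le \|w_{t^*}\| (\|Kx_{\eta^*}^\delta - y^\delta\| + \|y^\delta - Kx^\dagger\|) \le (1 + \sqrt{c_m}) \|w_{t^*}\| \delta$.

This shows the desired estimate. □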

Remark 3.2. In the practical applications of the discrepancy principle, one needs to find the solution of
a nonlinear equation in $\eta$. The uniqueness of a solution to equation (3) is not guaranteed, and additional
conditions need to be supplied for definiteness. Lastly, we would like to mention that the principle can be
efficiently realized by the model function approach Ul)j . 



3.2 Balancing principle 

The discrepancy principle described earlier requires an estimate of the noise level $\delta$, which is not always
available in practical applications. Therefore, it is of great interest to develop heuristic rules that do not
require this knowledge. One such rule is the balancing principle, for which there are several variants, see
[8] for details. The principle can be derived from the augmented Tikhonov (a-Tikhonov) regularization
[10], which admits clear statistical interpretations as hierarchical Bayesian modeling. In particular, it
provides the mechanism to automatically balance the penalty with the fidelity, see also Remark 3.4. The
variant under consideration is due to [7], and has demonstrated very promising empirical results for several
common single-parameter models [7]. Finally, we remind the reader that the balancing principle discussed here should
not be confused with the principle due to Lepskii, which is sometimes also named balancing principle [15]
and does require a precise knowledge of the noise level.

We first sketch the a-Tikhonov regularization approach. For multi-parameter models, it can be
derived analogously from Bayesian inference [TOMB], and the resulting a-Tikhonov functional J(x, r, {Aj}) 
is given by 

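$J(x, \tau, \{\lambda_i\}) = \tau\phi(x, y^\delta) + \lambda \cdot \psi(x) + \sum_i (\beta_i \lambda_i - \alpha_i \ln \lambda_i) + \beta_0 \tau - \alpha_0 \ln \tau$,

whose minimization maximizes the posterior probability density function $p(x, \tau, \{\lambda_i\} | y^\delta) \propto p(y^\delta | x, \tau, \{\lambda_i\}) \, p(x, \tau, \{\lambda_i\})$
under the assumption that the scalars $\{\lambda_i\}$ and $\tau$ have the Gamma distribution with known parameter
pairs $(\alpha_i, \beta_i)$ and $(\alpha_0, \beta_0)$, respectively. Let $\eta_i = \lambda_i / \tau$. Then the necessary optimality condition of the
a-Tikhonov functional is given by

$x_\eta^\delta = \arg\min_x \{\phi(x, y^\delta) + \eta \cdot \psi(x)\}$,
$\lambda_i = \frac{\alpha_i}{\psi_i(x_\eta^\delta) + \beta_i}$,
$\tau = \frac{\alpha_0}{\phi(x_\eta^\delta, y^\delta) + \beta_0}$.

Upon assuming $\alpha_i = \alpha$ and $\beta_i = \beta$ for simplicity and letting $\gamma = \alpha_0 / \alpha$, we have the following system
for $(x_\eta^\delta, \eta)$:

$x_\eta^\delta = \arg\min_x \{\phi(x, y^\delta) + \eta \cdot \psi(x)\}$,
$\eta_i = \frac{1}{\gamma} \frac{\phi(x_\eta^\delta, y^\delta) + \beta_0}{\psi_i(x_\eta^\delta) + \beta}$. (5)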
m 7 ' 
Next, we give the promised balancing principle. The multi-parameter counterpart of the balancing
principle given in [7] consists of minimizing

$\min_\eta \Phi_\gamma(\eta) := c_\gamma \frac{F^{2+\gamma}(\eta)}{\eta_1 \eta_2}$,

where the constant $c_\gamma = \frac{\gamma^\gamma}{(\gamma + 2)^{\gamma + 2}}$. We note that this constant $c_\gamma$ can be quite arbitrary, except for
comparison with the criterion $\Psi_\gamma$ defined next. Another variant of the balancing principle reads

$\Psi_\gamma(\eta) = \phi(x_\eta^\delta, y^\delta)^\gamma \, \psi_1(x_\eta^\delta) \, \psi_2(x_\eta^\delta)$,

which generalizes a criterion due to Reginska [6].

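The relation between $\Phi_\gamma$ and $\Psi_\gamma$ is made explicit in the following result.

Proposition 3.1. Let the value function $F$ be twice continuously differentiable, $\partial_i F$ ($i = 1, 2$) do not
vanish, and the Hessian $\nabla^2 F$ be nonsingular. Then the criteria $\Phi_\gamma$ and $\Psi_\gamma$ share the set of critical
points, which are the solutions to the system

$\gamma \eta_1 \psi_1(x_\eta^\delta) = \gamma \eta_2 \psi_2(x_\eta^\delta) = \phi(x_\eta^\delta, y^\delta)$. (6)

Proof. Setting the first-order derivatives of the criterion $\Phi_\gamma$ to zero gives

$\nabla \Phi_\gamma(\eta) = c_\gamma \frac{F^{1+\gamma}(\eta)}{\eta_1 \eta_2} \begin{pmatrix} (2 + \gamma)\psi_1(x_\eta^\delta) - \frac{F}{\eta_1} \\ (2 + \gamma)\psi_2(x_\eta^\delta) - \frac{F}{\eta_2} \end{pmatrix} = 0$.

This together with Lemma 2.3 gives $(2 + \gamma)\eta_i \psi_i(x_\eta^\delta) = F$, $i = 1, 2$. Consequently, $\eta_1 \psi_1(x_\eta^\delta) = \eta_2 \psi_2(x_\eta^\delta)$,
and thus system (6) holds. Meanwhile, by setting the first-order derivatives $\nabla \Psi_\gamma(\eta)$ of the criterion
$\Psi_\gamma$ to zero and noting Lemma 2.3 we get

$\nabla \Psi_\gamma(\eta) = \phi(x_\eta^\delta, y^\delta)^{\gamma - 1} \, \nabla^2 F \begin{pmatrix} \psi_2(x_\eta^\delta) \left( -\gamma \eta_1 \psi_1(x_\eta^\delta) + \phi(x_\eta^\delta, y^\delta) \right) \\ \psi_1(x_\eta^\delta) \left( -\gamma \eta_2 \psi_2(x_\eta^\delta) + \phi(x_\eta^\delta, y^\delta) \right) \end{pmatrix} = 0$.

By the assumption that the Hessian $\nabla^2 F$ is nonsingular and $\psi_i(x_\eta^\delta)$ ($i = 1, 2$) do not vanish, we arrive at

$\gamma \eta_i \psi_i(x_\eta^\delta) = \phi(x_\eta^\delta, y^\delta)$, $i = 1, 2$,

i.e., system (6). This concludes the proof. □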

Remark 3.3. Criterion $\Phi_\gamma$ makes use only of the value function $F(\eta)$, not of the derivatives of $F(\eta)$,
which can be potentially multi-valued in case that the functional $J_\eta$ has multiple minimizers. In contrast,
the value function $F(\eta)$ is always continuous, see Lemma 2.1, and thus the optimization problem of
minimizing $\Phi_\gamma$ over any bounded region is always well-defined. For models with potentially nonunique
minimizers, criterion $\Psi_\gamma$ and the balancing principle, i.e., equation (6), are ill-defined, and the corresponding
minimization formulations can be problematic. The criterion $\Phi_\gamma$ is advantageous then.

Remark 3.4. The balancing principle is named after system (6): it attempts to balance the fidelity with the
penalties, with the parameter $\gamma$ being the relative weight. Comparing (6) with (5) shows clearly the intimate
connections between the a-Tikhonov approach and the balancing principle: the a-Tikhonov approach builds
in the principle automatically, and consequently the hierarchical Bayesian modeling is also balancing.
Finally, we would like to remark that the balancing idea has been developed from other perspectives, see 
J21 Section 2.2] for details. 

The relation between the criteria $\Phi_\gamma$ and $\Psi_\gamma$ is made more precise in the following theorem: $\Psi_\gamma$ always
lies below $\Phi_\gamma$, and thus at each local minimum, $\Phi_\gamma$ is sharper and numerically easier to locate.

Theorem 3.6. For any $\gamma > 0$, the following inequality holds

$\Psi_\gamma(\eta) \le \Phi_\gamma(\eta)$, $\forall \eta > 0$.

The equality is achieved if and only if the balancing equation (6) is verified.

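Proof. Recall that for any $a, b, c \ge 0$ and $p, q, r > 1$ with $\frac{1}{p} + \frac{1}{q} + \frac{1}{r} = 1$, there holds the generalized
Young's inequality $abc \le \frac{a^p}{p} + \frac{b^q}{q} + \frac{c^r}{r}$, with equality if and only if $a^p = b^q = c^r$. Let $p = \frac{2 + \gamma}{\gamma}$
and $q = r = 2 + \gamma$, and write $\phi = \phi(x_\eta^\delta, y^\delta)$ and $\psi_i = \psi_i(x_\eta^\delta)$. Applying the inequality with
$a = \phi^{\frac{\gamma}{2+\gamma}} (\eta_1 \eta_2)^{-\frac{\gamma}{2(2+\gamma)}}$, $b = (\gamma \psi_1)^{\frac{1}{2+\gamma}} (\eta_1 / \eta_2)^{\frac{1}{2(2+\gamma)}}$ and
$c = (\gamma \psi_2)^{\frac{1}{2+\gamma}} (\eta_2 / \eta_1)^{\frac{1}{2(2+\gamma)}}$ gives

$\phi^{\frac{\gamma}{2+\gamma}} (\gamma^2 \psi_1 \psi_2)^{\frac{1}{2+\gamma}} (\eta_1 \eta_2)^{-\frac{\gamma}{2(2+\gamma)}} \le \frac{\gamma}{2 + \gamma} \frac{\phi + \eta_1 \psi_1 + \eta_2 \psi_2}{(\eta_1 \eta_2)^{1/2}} = \frac{\gamma}{2 + \gamma} \frac{F(\eta)}{(\eta_1 \eta_2)^{1/2}}$.

Hence

$(\phi^\gamma \psi_1 \psi_2)^{\frac{1}{2+\gamma}} \le \frac{\gamma^{\frac{\gamma}{\gamma+2}}}{\gamma + 2} \frac{F(\eta)}{(\eta_1 \eta_2)^{\frac{1}{2+\gamma}}}$.

Therefore, we have

$\Psi_\gamma(\eta) \le \frac{\gamma^\gamma}{(2 + \gamma)^{2+\gamma}} \frac{F^{2+\gamma}(\eta)}{\eta_1 \eta_2} = \Phi_\gamma(\eta)$.

The equality holds if and only if $a^p = b^q = c^r$, i.e.,

$\frac{\phi}{(\eta_1 \eta_2)^{1/2}} = \frac{\gamma \eta_1 \psi_1}{(\eta_1 \eta_2)^{1/2}} = \frac{\gamma \eta_2 \psi_2}{(\eta_1 \eta_2)^{1/2}}$.

Simplifying this gives the balancing equation (6). This concludes the proof. □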
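
The inequality between the two criteria, and the equality case, can be checked with plain numbers. The sketch below treats the fidelity and penalty values as given nonnegative scalars and forms $F = \phi + \eta_1\psi_1 + \eta_2\psi_2$ from them; the function names `c_gamma`, `phi_criterion` and `psi_criterion` are illustrative and not from the paper.

```python
def c_gamma(gamma):
    """Normalizing constant gamma^gamma / (gamma + 2)^(gamma + 2)."""
    return gamma ** gamma / (gamma + 2.0) ** (gamma + 2.0)


def phi_criterion(phi, psi1, psi2, eta1, eta2, gamma):
    """Phi_gamma = c_gamma * F^(2 + gamma) / (eta1 * eta2) with F = phi + eta . psi."""
    value = phi + eta1 * psi1 + eta2 * psi2
    return c_gamma(gamma) * value ** (2.0 + gamma) / (eta1 * eta2)


def psi_criterion(phi, psi1, psi2, gamma):
    """Psi_gamma = phi^gamma * psi_1 * psi_2."""
    return phi ** gamma * psi1 * psi2
```

With $\gamma = 2$, $\phi = 1$, $\psi = (1, 2)$ and $\eta = (0.5, 0.25)$ the balancing equation holds ($\gamma\eta_1\psi_1 = \gamma\eta_2\psi_2 = \phi = 1$) and both criteria evaluate to 2. For values that violate the balancing equation, `psi_criterion` is strictly smaller than `phi_criterion`.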

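The following result shows an interesting property of a minimizer to Criterion $\Phi_\gamma$.

Theorem 3.7. At a local minimizer $\eta^*$ to the function $\Phi_\gamma$, the partial derivatives of $F(\eta)$ exist.

Proof. Assume that the assertion is not true, i.e., $\eta^*$ is a discontinuity point of at least one $\psi_i$. Since $\eta^*$
is a local minimizer, we have

$\partial_i^- \Phi_\gamma(\eta^*) \le 0$ and $\partial_i^+ \Phi_\gamma(\eta^*) \ge 0$.

In particular, this implies that $\partial_i^+ \Phi_\gamma(\eta^*) - \partial_i^- \Phi_\gamma(\eta^*) \ge 0$. Note that

$\partial_i^+ \Phi_\gamma(\eta^*) - \partial_i^- \Phi_\gamma(\eta^*) = (2 + \gamma) c_\gamma \frac{F^{1+\gamma}(\eta^*)}{\eta_1^* \eta_2^*} \left[ \partial_i^+ F(\eta^*) - \partial_i^- F(\eta^*) \right]$,

and consequently $\partial_i^+ F(\eta^*) - \partial_i^- F(\eta^*) \ge 0$. This is in contradiction with the fact that at a discontinuity
point $\eta^*$, $\partial_i^+ F(\eta^*) - \partial_i^- F(\eta^*) < 0$ by the monotonicity of the function $\psi_i(x_\eta^\delta)$ with respect to $\eta_i$. □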

Now we present an a posteriori error estimate for Criterion $\Phi_\gamma$ when $Y$ is a Hilbert space, $\phi(x, y^\delta) =
\|Kx - y^\delta\|^2$ and the penalties are convex. The proof will be presented elsewhere, and we also refer to [7]. Theorem
3.8 provides one a posteriori way to check the automatically determined (vector-valued) regularization
parameter, and partially justifies the criterion theoretically.

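Theorem 3.8. Let the following source condition be satisfied for the exact solution $x^\dagger$: for any $t \in [0, 1]$
there exists a $w_t \in Y$ such that

$\xi_t \in \partial(t\psi_1(x^\dagger) + (1 - t)\psi_2(x^\dagger))$ and $\xi_t = K^* w_t$.

Then for every $\eta^*$ determined by the criterion $\Phi_\gamma$, there exists some constant $C$ such that

$d_{\xi_{t^*}}(x_{\eta^*}^\delta, x^\dagger) \le C \left( \|w_{t^*}\| + \frac{F^{1 + \gamma/2}(\delta e)}{F^{1 + \gamma/2}(\eta^*)} \right) \max(\delta, \delta_*)$,

where $e = (1, 1)^T$, $\delta_* = \|Kx_{\eta^*}^\delta - y^\delta\|$, and $t^* = \eta_1^* / (\eta_1^* + \eta_2^*)$.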

Finally, we present two algorithms, see Algorithms 1 and 2, for computing a minimizer of Criterion
$\Phi_\gamma$. The algorithms are of fixed point type, and can be regarded as natural extensions of the fixed point
algorithm in [7]. Practically, the algorithms exhibit very steady and fast convergence.

Algorithm 1 Fixed point algorithm I.

1: Choose $\gamma$, $\eta^0$ and set $k = 0$.
2: repeat
3: Solve for $x^{k+1}$ by the Tikhonov regularization method

   $x^{k+1} = \arg\min_x \{\phi(x, y^\delta) + \eta^k \cdot \psi(x)\}$.

4: Update the regularization parameter $\eta^{k+1}$ by

   $\eta_1^{k+1} = \frac{1}{1 + \gamma} \frac{\phi(x^{k+1}, y^\delta) + \eta_2^k \psi_2(x^{k+1})}{\psi_1(x^{k+1})}$,
   $\eta_2^{k+1} = \frac{1}{1 + \gamma} \frac{\phi(x^{k+1}, y^\delta) + \eta_1^k \psi_1(x^{k+1})}{\psi_2(x^{k+1})}$.

5: until a stopping criterion is satisfied.



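Algorithm 2 Fixed point algorithm II.

1: Choose $\gamma$, $\eta^0$ and set $k = 0$.
2: repeat
3: Solve for $x^{k+1}$ by the Tikhonov regularization method

   $x^{k+1} = \arg\min_x \{\phi(x, y^\delta) + \eta^k \cdot \psi(x)\}$.

4: Update the regularization parameter $\eta^{k+1}$ by

   $\eta_1^{k+1} = \frac{1}{\gamma} \frac{\phi(x^{k+1}, y^\delta)}{\psi_1(x^{k+1})}$,
   $\eta_2^{k+1} = \frac{1}{\gamma} \frac{\phi(x^{k+1}, y^\delta)}{\psi_2(x^{k+1})}$.

5: until a stopping criterion is satisfied.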
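
The following is a minimal sketch of the fixed point iteration of Algorithm 2 (update $\eta_i \leftarrow \phi / (\gamma\psi_i)$), under assumptions that are not from the paper: a hypothetical diagonal model with quadratic penalties so that the inner Tikhonov problem has a closed-form solution, one data component outside the range of $K$ so that the residual stays positive, and a componentwise relative-change stopping rule. The names `S`, `W`, `Y`, `solve_tikhonov`, `fixed_point_step` and `fixed_point_iteration` are illustrative. In practice the inner solve would be replaced by a solver suited to the chosen penalties.

```python
# Toy diagonal model (assumed for illustration only). The last component has
# S_j = 0, i.e. the data is not in the range of K, so phi stays positive.
S = [1.0, 0.5, 0.1, 0.01, 0.0]     # hypothetical singular values of K
W = [0.0, 1.0, 4.0, 9.0, 16.0]     # hypothetical weights of the second penalty
Y = [1.0, 0.4, 0.12, -0.05, 0.03]  # hypothetical noisy data


def solve_tikhonov(eta1, eta2):
    """Return (phi, psi_1, psi_2) at the closed-form minimizer of J_eta."""
    phi = psi1 = psi2 = 0.0
    for s, w, y in zip(S, W, Y):
        xj = s * y / (s * s + eta1 + eta2 * w)
        phi += (s * xj - y) ** 2
        psi1 += xj * xj
        psi2 += w * xj * xj
    return phi, psi1, psi2


def fixed_point_step(eta, gamma):
    """One sweep of the update eta_i <- phi / (gamma * psi_i)."""
    phi, psi1, psi2 = solve_tikhonov(*eta)
    return (phi / (gamma * psi1), phi / (gamma * psi2))


def fixed_point_iteration(eta0, gamma, max_iter=50, rtol=1e-3):
    """Iterate until the largest relative change of eta drops below rtol."""
    eta = eta0
    history = [eta]
    for _ in range(max_iter):
        new = fixed_point_step(eta, gamma)
        history.append(new)
        change = max(abs(n - e) / abs(n) for n, e in zip(new, eta))
        eta = new
        if change < rtol:
            break
    return eta, history
```

A call such as `fixed_point_iteration((1e-3, 1e-3), 5.0)` returns the final parameter pair and the iteration history. For this toy model the update map is order-preserving and the first step decreases both components, so the iterates decrease monotonically; no such behavior is claimed for general penalties.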



4 Numerical experiments 

This part presents numerical results for three examples, which are integral equations of the first kind with
kernel $k(s, t)$ and solution $x^\dagger(t)$, to illustrate features of multi-parameter models. The discretized linear
system takes the form $Kx^\dagger = y^\dagger$. The data $y^\dagger$ is corrupted by noise, i.e., $y_i^\delta = y_i^\dagger + \max_i\{|y_i^\dagger|\} \, \epsilon \, \xi_i$, where
$\xi_i$ are standard Gaussian variables and $\epsilon$ refers to the relative noise level. The fidelity $\phi$ is taken to be
the standard least-squares fitting. We present only the numerical results for Algorithm II, as Algorithm I
exhibits similar convergence behavior. The initial guess is always taken to be $1 \times 10^{-3}$, and the iteration is stopped
if the relative change of $\eta$ is smaller than $1.0 \times 10^{-3}$. The parameter $\gamma$ in Criterion $\Phi_\gamma$ is determined by a
two-step procedure [7]: the initial guess for $\gamma$ is set to 5, and then it is automatically adjusted according
to the estimated noise level.
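
The noise model and the stopping rule described above can be sketched as follows. This is an illustration under assumptions: the function names `add_noise` and `relative_change` are hypothetical, the random seed is arbitrary, and the relative change is measured in the Euclidean norm, which the text does not specify.

```python
import math
import random


def add_noise(y_exact, eps, seed=0):
    """Return y_i + max_j |y_j| * eps * xi_i with xi_i standard Gaussian."""
    rng = random.Random(seed)  # seeded generator for reproducibility
    scale = max(abs(v) for v in y_exact) * eps
    return [v + scale * rng.gauss(0.0, 1.0) for v in y_exact]


def relative_change(eta_new, eta_old):
    """Euclidean relative change ||eta_new - eta_old|| / ||eta_new||."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(eta_new, eta_old)))
    return diff / math.sqrt(sum(a * a for a in eta_new))
```

For example, `add_noise(y, 0.05)` produces data with 5% relative noise, and an iteration would be stopped once `relative_change(eta_new, eta_old) < 1.0e-3`.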

4.1 $H^1$-TV model

Example 1. Let $\zeta(t) = \chi_{|t| \le 3}(1 + \cos\frac{\pi t}{3})$, and the kernel $k$ is given by $k(s, t) = \zeta(s - t)$. The exact
solution $x^\dagger$ is shown in Fig. 1 and the integration interval is $[-6, 6]$. The solution $x^\dagger$ exhibits both flat
and smoothly varying regions, and thus we adopt two penalties $\psi_1(x) = \frac{1}{2}|x|_{H^1}^2$ and $\psi_2(x) = |x|_{TV}$ for
preserving their distinct features. The size of the problem is 100.






Table 1: Numerical results for Example 1.

$\epsilon$   $\eta_b$             $\eta_o$              $\eta_{h1}$  $\eta_{tv}$  $e_b$    $e_o$    $e_{h1}$  $e_{tv}$
5e-2   (3.44e-3,5.75e-3)  (2.36e-4,2.14e-3)   5.68e-4    9.27e-3  3.31e-2  2.66e-2  3.97e-2   1.07e-1
5e-3   (1.03e-4,1.83e-4)  (2.19e-5,3.70e-4)   6.81e-5    4.85e-4  2.27e-2  1.10e-2  2.69e-2   9.48e-2
5e-4   (3.32e-6,6.12e-6)  (2.89e-6,5.07e-5)   1.26e-6    6.08e-5  1.25e-2  8.85e-3  1.38e-2   4.48e-2
5e-5   (1.07e-7,2.04e-7)  (7.04e-8,5.23e-6)   1.14e-7    4.06e-6  6.82e-3  5.53e-3  9.40e-3   1.68e-2
5e-6   (3.01e-9,5.77e-9)  (2.06e-10,6.65e-9)  6.01e-10   2.24e-7  4.50e-3  2.89e-3  5.28e-3   5.12e-3




Figure 1: Numerical results for Example 1 with 5% noise. Panels: $H^1$-TV sol. with $\eta_b$; $H^1$ sol. with $\eta_{h1}$; TV sol. with $\eta_{tv}$.

The numerical results are summarized in Table 1. In the table, the subscripts b and o refer to
the balancing principle and the optimal choice, i.e., the value giving the smallest reconstruction error,
respectively. The results for single-parameter models are indicated by subscripts h1 and tv, and the
respective penalty parameter shown in Table 1 is the optimal one. The accuracy of the results is measured
by the relative $L^2$ error $e = \|x - x^\dagger\|_{L^2} / \|x^\dagger\|_{L^2}$. A first observation is that the error $e_b$ by the balancing
principle for the proposed $H^1$-TV model is smaller than that of the optimal choice for either the $H^1$ or the TV penalty.
This illustrates clearly the benefit of using a multi-parameter model. Interestingly, the balancing principle
gives an error fairly close to the optimal one, and the error decreases as the noise level decreases.

The numerical results for Example 1 with $\epsilon = 5\%$ noise are shown in Fig. 1. In particular, the classical
$H^1$ smoothness penalty fails to restore the flat region satisfactorily, whereas the TV approach suffers
from the staircase effect in the gray region and reduced magnitude in the flat region, see Fig. 1. In contrast,
the proposed $H^1$-TV model preserves the magnitude of the flat region while reconstructing the gray region
excellently. Therefore, it indeed combines the strengths of both $H^1$ and TV models, and is suitable for
restoring images with both flat and gray regions. The criterion $\Phi_\gamma$ is numerically well-behaved: there is
a distinct local minimum, and it is numerically easy to minimize, see Fig. 2. Finally, we would like to
remark that the algorithm converges rapidly, with convergence achieved in five iterations, see Fig. 2.

Figure 2: Numerical results for Example 1 with 5% noise. Panels: $\Phi_\gamma(\eta)$; convergence of Algorithm II.

Figure 3: Numerical results for Example 2 with 5% noise. Panels: $\ell^1$-$\ell^2$ sol. with $\eta_b$; $\ell^1$ sol. with $\eta_{l1}$; $\ell^2$ sol. with $\eta_{l2}$.

Figure 4: Numerical results for Example 2 with 5% noise. Panels: $\Phi_\gamma(\eta)$; convergence of Algorithm II.

4.2 $\ell^1$-$\ell^2$ model

Example 2. The kernel $k$ is given by $k(s, t) = \frac{1}{4}\left(\frac{1}{16} + (s - t)^2\right)^{-3/2}$, the exact solution consists of two
bumps and it is shown in Fig. 3. The penalties are $\psi_1(x) = \|x\|_{\ell^1}$ and $\psi_2(x) = \frac{1}{2}\|x\|_{\ell^2}^2$ to retrieve the
groupwise sparsity structure. The integration interval is $[0, 1]$. The size of the problem is 100.

The numerical results for this example are shown in Table 2 and Fig. 3. Here we are interested in
the group structure of the solution with a minimal number of influencing factors (nonzero coefficients).
Again, we observe that the elastic-net compares favorably with the conventional $\ell^1$ and $\ell^2$ penalties
in terms of the error, and the balancing principle can give a reasonable estimate for the optimal choice.
The conventional $\ell^2$ solution contains almost no zero entries, and thus it fails to distinguish between
influencing and noninfluencing coefficients, i.e., to identify the relevant factors. This difficulty is partially
remedied by the $\ell^1$ model in that many entries of the $\ell^1$ solution are zero. Therefore, some relevant
factors are correctly identified. However, it tends to select only some instead of all relevant factors, i.e.,
it misses the group structure. The elastic-net model combines the best of both $\ell^1$ and $\ell^2$ models, and it achieves the
desired goal of identifying the group structure. Moreover, the magnitudes assigned to the coefficients are
reasonable compared to the others. The algorithm converges quickly within five iterations.

Table 2: Numerical results for Example 2.

$\epsilon$   $\eta_b$             $\eta_o$             $\eta_{l1}$  $\eta_{l2}$  $e_b$    $e_o$    $e_{l1}$  $e_{l2}$
5e-2   (2.75e-3,1.09e-2)  (3.16e-3,1.32e-3)  2.96e0     3.34e-3  4.18e-1  8.72e-2  1.04e0    4.59e-1
5e-3   (9.16e-5,2.86e-4)  (2.46e-4,1.07e-4)  1.03e-4    3.06e-5  2.09e-1  1.24e-2  8.97e-1   2.90e-1
5e-4   (2.82e-6,7.48e-6)  (2.34e-5,1.14e-5)  1.30e-5    4.08e-6  5.76e-2  7.98e-3  6.18e-1   2.17e-1
5e-5   (8.89e-8,2.26e-7)  (2.27e-6,1.06e-6)  1.24e-6    3.84e-8  1.57e-2  4.71e-3  4.85e-1   1.66e-1
5e-6   (2.79e-9,7.07e-9)  (1.66e-7,1.03e-7)  4.12e-9    1.41e-9  1.27e-2  2.27e-3  2.61e-1   9.55e-2

Figure 5: Numerical results for Example 3 with 1% noise. Panels: $\ell^1$-$\ell^2$ sol. with $\eta_o$ = (1.25e-2, 1.29e-3); $\ell^1$-$\ell^2$ sol. with $\eta_b$ = (1.14e-3, 1.12e-3); $\ell^1$ sol. with $\eta_{l1}$ = 5.30e-1; $\ell^2$ sol. with $\eta_{l2}$ = 3.31e-3.

4.3 2D image deblurring 

Example 3. The penalties are $\psi_1(x) = \|x\|_{\ell^1}$ and $\psi_2(x) = \frac{1}{2}\|x\|_{\ell^2}^2$. The kernel $k$ performs standard
Gaussian blur with standard deviation 1 and blurring width 5. The exact solution $x^\dagger$ is shown in Fig. 5.
The size of the image is 50 x 50.

This example showcases a more realistic problem of image deblurring. Here one half of the data points
are retained. The reconstructions for 1% noise are shown in Fig. 5. The $\ell^1$ solution is more spiky, and
neighboring pixels more or less act independently. In particular, due to missing data, there are some
missing pixels in the blocks and the cross to be recovered. In contrast, the $\ell^2$ solution is more blockwise,
but there are many nonzero coefficients, indicated by the small spurious oscillations in the background.
The elastic-net model achieves the best of the two: it retains the block structure with fewer spurious
nonzero coefficients. This is deemed important in medical imaging, e.g., classification. The numbers are
more telling: $e_b = 2.99 \times 10^{-1}$, $e_o = 2.44 \times 10^{-1}$, $e_{l1} = 9.21 \times 10^{-1}$, and $e_{l2} = 3.42 \times 10^{-1}$. Therefore, the
error $e_b$ for elastic-net agrees well with the optimal choice, and it is smaller than that with the optimal
choices for both $\ell^1$ and $\ell^2$ models.



5 Concluding remarks 

We have studied theoretical properties of multi-parameter Tikhonov regularization. Some properties,
e.g., monotonicity, concavity, asymptotics and differentiability, of the value function were established.
The discrepancy principle is partially justified in terms of consistency and convergence rates. However,
the regularization parameter is not uniquely determined, which partially limits its practical application.
It is of interest to develop auxiliary rules, which is currently under investigation. In contrast, the
balancing principle allows justifications in terms of an a posteriori error estimate and efficient numerical
implementation. The numerical experiments show that multi-parameter models can significantly improve
the reconstruction quality, and that the balancing principle can give reasonable results in comparison
with the optimal choice in a computationally efficient way. The two proposed algorithms for computing
the parameters of the balancing principle exhibit excellent convergence behavior. However, a rigorous
convergence analysis remains to be established.